ABSTRACT

During cosmic reionization, the 21-cm brightness fluctuations were highly non- 
Gaussian, and complementary statistics can be extracted from the distribution of 
pixel brightness temperatures that are not derivable from the 21-cm power spectrum. 
One such statistic is the 21-cm difference PDF, the probability distribution function of 
the difference in the 21-cm brightness temperatures between two points, as a function 
of the distance between the points. Guided by 21-cm difference PDFs extracted from 
simulations, we perform a maximum likelihood analysis on mock observational data, 
and analyze the ability of present and future low-frequency radio array experiments 
to estimate the shape of the 21-cm difference PDF, and measure the history of cosmic 
reionization. We find that one-year data with an experiment such as the Murchison 
Wide-field Array should suffice for probing large scales during the mid-to-late stages of 
reionization, while a second-generation experiment should yield detailed measurements 
over a wide range of scales during most of the reionization era. 
1 INTRODUCTION

In the coming decade, low-frequency radio arrays will begin to probe the
epoch of reionization via the redshifted 21-cm hydrogen line. Current
observational efforts include the MWA (Murchison Wide-field Array)
(Bowman, Morales & Hewitt 2009), LOFAR (Low Frequency Array)
(Harker et al. 2010), PAPER (Precision Array for Probing the Epoch of
Reionization) (Jacobs et al. 2011), and the GMRT (Giant Metrewave Radio
Telescope) (Paciga et al. 2011). Successful interpretation of these
observations will require effective statistical techniques for analyzing
the data. Due to the difficulty of these measurements, it is important to
develop techniques beyond the standard power spectrum analysis, in order to
offer independent confirmation of the reionization history, probe different
aspects of the topology of reionization, and do this with methods subject
to different systematic errors.

During reionization, the hydrogen distribution is a highly non-linear
function of the distribution of the underlying ionizing sources. A natural
statistic for probing the expected non-Gaussianity is the one-point
probability distribution function (PDF) of the 21-cm brightness temperature
at a point (Furlanetto, Zaldarriaga & Hernquist 2004; Ciardi & Madau 2003;
Mellema et al. 2006; Wyithe & Morales 2007; Harker et al. 2009;
Ichikawa et al. 2010; Gluscevic & Barkana 2010). In this paper, we focus
on the PDF $p_\Delta(\Delta T_b)$ of the difference in 21-cm brightness
temperature between two points in the cosmological volume,
$\Delta T_b = |T_2 - T_1|$. This 21-cm difference PDF, suggested by
Barkana & Loeb (2008), is a two-dimensional function, dependent not only on
$\Delta T_b$, but also on the separation $r$ between the two points at which
the difference in brightness temperature is measured.

There are at least three advantages to the 21-cm difference PDF statistics
(Gluscevic & Barkana 2010), which we summarize here. Firstly, if the number
of resolved cubic pixels (i.e., voxels) in the observed volume is $N$, the
number of data points available for reconstructing the one-point PDF is
only $N$, whereas the number of data points available for reconstructing
the difference PDF ($N^2/2$) is overwhelmingly larger, although the latter
data points must be sorted into bins of distance $r$. Thus, we might expect
to do better than with the one-point PDF, which requires rather strong
assumptions in order to allow a reconstruction of the reionization history
with upcoming experiments (Ichikawa et al. 2010). Secondly, the 21-cm
difference PDF generalizes both the one-point PDF and the two-point
correlation function of $T_b$ (the latter of which can be deduced using the
variance of the difference PDF, and is equivalent to the power spectrum),
and also yields additional information beyond those statistics. Thirdly,
the difference PDF avoids (by its very definition) the unwanted
contribution of the mean sky background temperature, and is readily
applicable to temperature differences measured with radio interferometry.



2 METHODOLOGY 

We adopt the expected parameters for 1-year observations of a single field
of view with the MWA, using equations for 21-cm interferometer arrays from
the review by Furlanetto, Oh & Briggs (2006), with an integration time of
$t_{\rm int} = 1000$ hours, a collecting area of
$A_{\rm tot} \sim 2 \times 10^3$ m$^2$, a field of view of $\pi 16^2$
deg$^2$ and a total bandwidth of $\Delta\nu_{\rm tot} = 6$ MHz. Note that
the collecting area here is 4 times smaller than the collecting area
assumed in Ichikawa et al. (2010), given the scaling down of the first
generation of the MWA compared to earlier plans. Then, assuming cubic
pixels of size $r_{\rm com}$ (all distances comoving), we find the following
expected number of voxels $N_p$ and root-mean-square noise in each one
$\sigma_N$:



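[Equations (1) and (2), giving $N_p$ and $\sigma_N$ as functions of
$r_{\rm com}$ and redshift, are not legible in the source text; only the
leading factor $N_p \simeq 8.2 \times 10^{?}$ survives.]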



In order to look a bit ahead, we also consider specifications with lower
noise in the same field of view. We denote 1/2 the noise as MWA/2 (which
corresponds to 4-year data with the MWA) and 1/10 the noise as MWA/10; the
latter is a conservative estimate (by at least a factor of a few) for
larger, second-generation 21-cm arrays such as the SKA (Square Kilometer
Array) or a 5000-antenna MWA. The only source of noise we consider is
Gaussian thermal noise, whose magnitude is determined by the receiver's
system temperature, which is set by the sky's brightness temperature
dominated by Galactic synchrotron emission (Furlanetto, Oh & Briggs 2006).
This assumes perfect foreground removal from 21-cm maps. Clearly, the first
step for any proposed measurement method is to prove its feasibility
against thermal noise, which can then motivate more detailed analyses that
include a larger range of observational difficulties and sources of noise.
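
As a quick check of the correspondence between noise level and observing
time, the sketch below assumes that the thermal noise scales as the inverse
square root of the integration time at fixed collecting area (the usual
radiometer scaling, which is consistent with MWA/2 corresponding to 4-year
data). The function names are illustrative and not taken from any
instrument software.

```python
import math


def noise_factor(years, reference_years=1.0):
    """Thermal noise relative to the reference run, assuming noise ~ t^(-1/2)."""
    return math.sqrt(reference_years / years)


def years_for_improvement(factor, reference_years=1.0):
    """Integration time needed to cut the noise by `factor` with the same array."""
    return reference_years * factor ** 2


for label, factor in [("MWA", 1.0), ("MWA/2", 2.0), ("MWA/10", 10.0)]:
    years = years_for_improvement(factor)
    print(f"{label}: {years:g} year(s) of integration with the same array")
```

Under this assumption a tenfold noise reduction would take a century with
the first-generation array, which is why MWA/10 is better thought of as a
proxy for a larger collecting area.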



2.1 Model PDF and Thermal Noise 

We begin by considering a general PDF, which could be the regular
(one-point) PDF or the difference PDF. To determine a best-fit PDF using
observed data, we characterize the PDF with a finite number of parameters.
We do so with a binned PDF, expressed as the sum of boxcar functions for
each bin, $p(x) = \sum_i F_i(x)$, where

$$
F_i(x) =
\begin{cases}
h_i & \text{if } a_i < x < a_{i+1} \\
0 & \text{otherwise.}
\end{cases}
\qquad (3)
$$



Here $h_i$ is the bin height, and the probability contained in bin $i$ is
$p_i = h_i (a_{i+1} - a_i)$. In a model with $N_b$ bins where the bin edges
$\{a_1, a_2, \ldots, a_{N_b+1}\}$ are fixed, the binned PDF only has
$N_b - 1$ free parameters $\{h_1, h_2, \ldots, h_{N_b-1}\}$, as $h_{N_b}$
is fixed by the requirement that the probabilities sum up to unity. Note
that throughout this paper we use this very general binned form for the
PDF, and do not need to assume a particular functional form for the
difference PDF as is necessary for the one-point PDF given its much lower
signal-to-noise ratio (Ichikawa et al. 2010).
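
The parameterization can be sketched in a few lines of Python. The helper
names (`binned_pdf`, `evaluate`) and the example heights are hypothetical;
the sketch only illustrates how the last bin height follows from
normalization once the other $N_b - 1$ heights are chosen. It uses
half-open bins so that every point belongs to exactly one bin.

```python
def binned_pdf(edges, free_heights):
    """Return all N_b bin heights; the last one is fixed by normalization."""
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    if len(free_heights) != len(widths) - 1:
        raise ValueError("need exactly N_b - 1 free heights")
    # Probability already assigned to the first N_b - 1 bins.
    used = sum(h * w for h, w in zip(free_heights, widths))
    last_height = (1.0 - used) / widths[-1]
    if last_height < 0.0:
        raise ValueError("free heights already exceed unit probability")
    return list(free_heights) + [last_height]


def evaluate(x, edges, heights):
    """Evaluate the sum of boxcar functions at x."""
    for lo, hi, h in zip(edges[:-1], edges[1:], heights):
        if lo <= x < hi:
            return h
    return 0.0


# Example: one free bin (0 to 4 mK) plus a normalization bin (4 to 40 mK).
edges = [0.0, 4.0, 40.0]
heights = binned_pdf(edges, [0.2])  # p_1 = 0.2 * 4 = 0.8
print(heights)
```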

Theoretically, the measured PDF will be the true PDF convolved with the
noise. For example, the 21-cm brightness temperature one-point PDF measured
by instruments will be the true one-point PDF $p(T)$ convolved with an
extremely broad normal distribution:

$$
N(0, \sigma_N^2) = \frac{1}{\sqrt{2\pi}\,\sigma_N}
\exp\left(-\frac{T^2}{2\sigma_N^2}\right),
\qquad (4)
$$

with zero mean and standard deviation $\sigma_N$ due to thermal noise
(Equation 2). Hence, the noisy one-point PDF can be expressed as a sum of
the convolution of boxcar functions with the Gaussian,

$$
p_{\rm noisy}(T) = \sum_i F_i(T) * N(0, \sigma_N^2),
\qquad (5)
$$

where

$$
F_i(T) * N(0, \sigma_N^2) = \frac{h_i}{2}
\left[ {\rm Erf}\left(\frac{T - a_i}{\sqrt{2}\,\sigma_N}\right)
- {\rm Erf}\left(\frac{T - a_{i+1}}{\sqrt{2}\,\sigma_N}\right) \right].
\qquad (6)
$$

Here Erf(x) is the error function. $p_{\rm noisy}(T)$ can be used to
generate mock observations of the one-point PDF.
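
The convolution of a binned PDF with Gaussian noise can be evaluated
directly from error functions. The following Python sketch does so for a
two-bin example and checks numerically that the result still integrates to
unity. The bin edges, heights and noise level are hypothetical values chosen
only for illustration.

```python
import math


def noisy_boxcar(t, a_lo, a_hi, height, sigma_n):
    """Boxcar of given height on [a_lo, a_hi] convolved with N(0, sigma_n^2)."""
    s = math.sqrt(2.0) * sigma_n
    return 0.5 * height * (math.erf((t - a_lo) / s) - math.erf((t - a_hi) / s))


def noisy_pdf(t, edges, heights, sigma_n):
    """Sum of the noise-convolved boxcars over all bins."""
    return sum(noisy_boxcar(t, lo, hi, h, sigma_n)
               for lo, hi, h in zip(edges[:-1], edges[1:], heights))


def integrate(func, lo, hi, steps):
    """Midpoint-rule integral, adequate for a smooth integrand."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


edges = [0.0, 10.0, 30.0]   # mK, hypothetical
heights = [0.06, 0.02]      # probabilities 0.6 + 0.4 = 1
sigma_n = 50.0              # mK, hypothetical noise much broader than the PDF

total = integrate(lambda t: noisy_pdf(t, edges, heights, sigma_n),
                  -600.0, 630.0, 12300)
print(total)  # close to 1: convolution with noise preserves normalization
```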

For symmetry and convenience, the difference PDF $p_\Delta(\Delta T_b)$
defined in the literature (Barkana & Loeb 2008) is a function of the
absolute difference in brightness temperature $\Delta T_b = |T_2 - T_1|$.
However, to find the effect of thermal noise on the difference PDF
$p_{\Delta,{\rm noisy}}(\Delta T_b)$, it is easier to first find the
probability distribution function $p_{\Delta,{\rm noisy}}(\Delta T_{21})$ as
a function of the temperature difference $\Delta T_{21} = T_2 - T_1$ without
the absolute value. If the thermal noise at any two points 1 and 2 is
uncorrelated, then their temperature difference has a root-mean-square
thermal noise of $\sqrt{2}\,\sigma_N$. If the intrinsic difference PDF is
$p_\Delta(\Delta T_{21})$, the observed version is then:

$$
\begin{aligned}
p_{\Delta,{\rm noisy}}(\Delta T_{21})
&= p_\Delta(\Delta T_{21}) * N(0, 2\sigma_N^2) \\
&= \sum_i F_i(\Delta T_{21}) * N(0, 2\sigma_N^2) \\
&= \sum_i \frac{h_i}{2}
\left[ {\rm Erf}\left(\frac{\Delta T_{21} - a_i}{2\sigma_N}\right)
- {\rm Erf}\left(\frac{\Delta T_{21} - a_{i+1}}{2\sigma_N}\right) \right].
\end{aligned}
\qquad (7)
$$

Now, due to symmetry, $p_{\Delta,{\rm noisy}}(\Delta T_{21})$ is an even
function, so we can recover the difference PDF as defined for the absolute
temperature differences, with thermal noise included, via

$$
p_{\Delta,{\rm noisy}}(\Delta T_b) =
\begin{cases}
2\, p_{\Delta,{\rm noisy}}(\Delta T_{21}) & \text{if } \Delta T_{21} \geq 0 \\
0 & \text{otherwise.}
\end{cases}
\qquad (8)
$$
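
The two-step construction (convolve the signed difference, then fold onto
non-negative values) can be sketched as follows. One reading is made
explicit here as an assumption: the signed intrinsic PDF is taken to be the
even extension of the binned $p_\Delta$, so each bin contributes half of its
height on the positive side and half on the mirrored negative side. The bin
edges, heights and noise level are hypothetical.

```python
import math


def smoothed_box(x, a_lo, a_hi, height, sigma_n):
    """Boxcar convolved with N(0, 2 sigma_n^2); sqrt(2) * sqrt(2) * sigma_n = 2 sigma_n."""
    s = 2.0 * sigma_n
    return 0.5 * height * (math.erf((x - a_lo) / s) - math.erf((x - a_hi) / s))


def signed_noisy_pdf(dt21, edges, heights, sigma_n):
    """Noisy PDF of the signed difference T2 - T1 (assumed even extension)."""
    total = 0.0
    for lo, hi, h in zip(edges[:-1], edges[1:], heights):
        total += smoothed_box(dt21, lo, hi, 0.5 * h, sigma_n)    # positive side
        total += smoothed_box(dt21, -hi, -lo, 0.5 * h, sigma_n)  # mirrored side
    return total


def abs_noisy_pdf(dtb, edges, heights, sigma_n):
    """Fold the even signed PDF onto absolute differences."""
    if dtb < 0.0:
        return 0.0
    return 2.0 * signed_noisy_pdf(dtb, edges, heights, sigma_n)


edges = [0.0, 4.0, 40.0]        # mK, hypothetical 1-bin model
heights = [0.2, 0.2 / 36.0]     # p_1 = 0.8, normalization bin 0.2
sigma_n = 20.0                  # mK, hypothetical

steps, upper = 8000, 400.0
width = upper / steps
norm = width * sum(abs_noisy_pdf((k + 0.5) * width, edges, heights, sigma_n)
                   for k in range(steps))
print(norm)  # close to 1
```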



To use equation (7), we need to assume a true difference PDF, with which to
convolve the thermal noise. To this end, we use the binned difference PDF
as measured in the fiducial S1 simulation of McQuinn et al. (2007), who
modeled the density field during the epoch of reionization with a $1024^3$
N-body simulation in a box size of $\sim 94$ Mpc, post-processing it using
a suite of radiative-transfer simulations to characterize the morphology
and size distribution of ionized regions during reionization. Analytic
prescriptions were used to model reionization effects of small-scale
structure that was unresolved in the N-body simulation. Source parameters
were chosen so that reionization ends near $z \sim 7$ in the simulation.

The 21-cm difference PDFs from these simulations, shown in Figure 1, were
first presented by Gluscevic & Barkana (2010); here we use the same
redshift slices (taken at 50 Myr intervals), the same cubic voxel size of
2.9 (comoving) Mpc, and the same logarithmically spaced distance bins to
obtain the 'true' difference PDF from the simulation. The central values of
the logarithmic distance bins are $r_{\rm mid} = 4.3$, 8.3, 16.2, 31.5,
61.4, and 119.5 Mpc. However, instead of using the same 20 linearly spaced
temperature bins as Gluscevic & Barkana (2010), we vary the number and
interval size of our temperature bins $\{a_1, a_2, \ldots, a_{N_b+1}\}$, to
see the dependence of the fit errors on the number of free parameters. In
general, reducing the number of bins gives a more accurate determination of
the PDF, but at the price of less detailed information on its shape, since
the measured PDF is (at best) the true one but smoothed on the scale of the
bin size.

Mock observational difference PDF values can be created by randomly
generating $n$ values of $\Delta T_b$, using the distribution
$p_{\Delta,{\rm noisy}}(\Delta T_b)$. The same functional form can be
employed in finding the best-fit parameters $\{h_i\}$ to the same mock
observations with a maximum likelihood method. In this paper, we sample
each difference PDF and generate 1000 Monte Carlo instances of
observational data for that model, and thus obtain a well-sampled
distribution of reconstructed model parameters.
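
One simple way to draw such mock values, sketched below, is to sample the
intrinsic difference from the binned PDF, assign it a random sign, add
Gaussian noise of rms $\sqrt{2}\,\sigma_N$, and take the absolute value.
This is an illustrative recipe rather than a description of the actual
pipeline; the uniform draw within each bin follows from the boxcar model,
and the bin edges, probabilities, noise level and sample size are
hypothetical (the real $n$ is far larger).

```python
import math
import random


def sample_mock_differences(edges, probs, sigma_n, n, rng):
    """Draw n noisy absolute temperature differences from a binned PDF."""
    bins = rng.choices(range(len(probs)), weights=probs, k=n)
    noise_rms = math.sqrt(2.0) * sigma_n  # two independent noisy voxels
    samples = []
    for i in bins:
        true_abs = rng.uniform(edges[i], edges[i + 1])
        # The signed difference T2 - T1 is symmetric about zero.
        signed = true_abs if rng.random() < 0.5 else -true_abs
        samples.append(abs(signed + rng.gauss(0.0, noise_rms)))
    return samples


rng = random.Random(12345)
edges = [0.0, 4.0, 40.0]   # mK, hypothetical
probs = [0.8, 0.2]         # bin probabilities p_i
mock = sample_mock_differences(edges, probs, 20.0, 100000, rng)

# Noise adds in quadrature: <dT^2> = <dT_true^2> + 2 sigma_N^2.
mean_square = sum(v * v for v in mock) / len(mock)
print(mean_square)
```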

2.2 Number of Voxel Pairs 

For a given observational volume, voxel size, and distance $r$ between
pairs of voxels, the number of voxel-pairs $N(r)dr$ (at distances between
$r$ and $r + dr$) is uniquely determined. Thus, for each bin we sample
$n = \int N(r)dr$ values of $\Delta T_b$ from
$p_{\Delta,{\rm noisy}}(\Delta T_b)$ as our mock data.

With the edge size of each voxel normalized to 1, the number of voxel-pairs
in a cubic volume $V = L^3$ as a function of the voxel distance $r$ can be
closely approximated by the number of voxel-pairs in a sphere of the same
volume $V = 4\pi R^3/3$ at the same distance, at least for small $r$. A
sphere is easier to analyze, and yields an analytical result for the
voxel-pair distribution function:

$$
N(r)dr = \frac{\pi^2}{6}\, r^2 (2R - r)^2 (4R + r)\, dr.
\qquad (9)
$$

Equation (9) is exact for spherical volumes for all $r \in (0, 2R)$, in the
limit of infinitesimal $dr$ and voxel size (compared to $r$ and $R$). The
total number of voxel pairs at all $r$ in a sphere of radius $R$ is

$$
\int_0^{2R} N(r)dr = \frac{1}{2} \left( \frac{4}{3}\pi R^3 \right)^2,
\qquad (10)
$$

which is $\frac{1}{2}N^2$ as expected, in terms of the total number of
voxels $N$.
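
Both properties of the pair distribution (its total and its small-$r$
behavior) are easy to verify numerically. The sketch below uses the
closed-form antiderivative of $r^2(2R - r)^2(4R + r) = 16R^3r^2 - 12R^2r^3
+ r^5$; the sphere radius is an arbitrary illustrative value in voxel units.

```python
import math


def pair_density(r, big_r):
    """N(r) for a sphere of radius big_r, with the voxel size set to 1."""
    return (math.pi ** 2 / 6.0) * r ** 2 * (2.0 * big_r - r) ** 2 * (4.0 * big_r + r)


def pairs_between(r_lo, r_hi, big_r):
    """Exact integral of N(r) between r_lo and r_hi."""
    def antiderivative(r):
        return (math.pi ** 2 / 6.0) * (
            16.0 / 3.0 * big_r ** 3 * r ** 3
            - 3.0 * big_r ** 2 * r ** 4
            + r ** 6 / 6.0)
    return antiderivative(r_hi) - antiderivative(r_lo)


big_r = 100.0                                   # arbitrary radius in voxel units
n_voxels = 4.0 * math.pi * big_r ** 3 / 3.0     # voxels in the sphere

total_pairs = pairs_between(0.0, 2.0 * big_r, big_r)
print(total_pairs / (0.5 * n_voxels ** 2))      # ratio to N^2 / 2

# Small-r limit: N(r) should approach 2 pi V r^2.
small_r_ratio = pair_density(1.0, big_r) / (2.0 * math.pi * n_voxels * 1.0 ** 2)
print(small_r_ratio)                            # slightly below 1
```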

If the distance between voxel-pairs is much less than the characteristic
size of the observable volume, then the total number of voxel-pairs at that
small distance will not be sensitive to the shape of the volume, but only
to its size. In considering such a pair, voxel #1 can be chosen anywhere
within the volume, and #2 must then be at a distance $r$ in any direction.
As long as $r \ll L$ (or $r \ll R$), the full sphere of radius $r$ about #1
will almost always fall within the big volume, so that all #2 voxels on
this small sphere are allowed. Thus, the number of pairs will be the number
of #1 voxels $V$ times the number of #2 voxels $4\pi r^2 dr$, divided by 2
for double-counting of pairs. The result of $2\pi V r^2 dr$ agrees with
equation (9) in the small-$r$ limit. Corrections to this result will come
from cases where voxel #1 is within a distance $r$ of the volume's
boundary, i.e., the correction is of order the surface area times $r$
divided by the volume, which (for simply-connected convex volumes such as a
sphere or cube) is of order $r/V^{1/3}$.

Table 1. The number of voxel-pairs $\int N(r)dr$ in each distance bin, as a
function of redshift. The distance bins are logarithmically spaced and are
denoted via their central values $r_{\rm mid}$, in units of (comoving) Mpc.
The size of each voxel is chosen to be 2.9 Mpc, consistent with
Gluscevic & Barkana (2010), from which the simulated difference PDFs were
taken. Due to equation (1), the total number of voxels (and thus $N(r)$)
has a slight redshift dependence. Higher separation bins $r_{\rm mid}$ have
orders of magnitude more voxel-pairs compared to lower $r_{\rm mid}$,
varying roughly as $r^3$ when $r \ll L$ (since the bin width $\propto r$).

For the MWA, if we assume that the volume it will observe on the sky is
approximately cubic, equation (1) implies that the length of the cube
$L \sim 1000$ Mpc for all redshifts of interest. This is much greater than
the largest distance ($r \sim 163$ Mpc) between voxel-pairs we consider in
this paper. Since $L \gg r$, we use equation (9) as an excellent proxy for
the number of voxel-pairs the MWA will observe, and list the values of
$\int N(r)dr$ in each distance bin in Table 1.
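
To give a feel for the numbers involved, the sketch below evaluates the
pair count per distance bin for a cube of side 1000 Mpc and 2.9 Mpc voxels,
using the equal-volume sphere. The bin edges are an assumption: each bin is
taken to extend a factor of $\sqrt{q}$ below and above its central value,
where $q$ is the mean ratio between consecutive centers. The values it
prints are therefore indicative only and do not reproduce Table 1, which
also includes the redshift dependence of the number of voxels.

```python
import math


def pairs_between(r_lo, r_hi, big_r):
    """Integral of (pi^2/6) r^2 (2R - r)^2 (4R + r) dr, voxel size set to 1."""
    def antiderivative(r):
        return (math.pi ** 2 / 6.0) * (
            16.0 / 3.0 * big_r ** 3 * r ** 3
            - 3.0 * big_r ** 2 * r ** 4
            + r ** 6 / 6.0)
    return antiderivative(r_hi) - antiderivative(r_lo)


VOXEL_MPC = 2.9      # voxel size from the text
BOX_MPC = 1000.0     # approximate cube length from the text

n_voxels = (BOX_MPC / VOXEL_MPC) ** 3
big_r = (3.0 * n_voxels / (4.0 * math.pi)) ** (1.0 / 3.0)  # equal-volume sphere

centers_mpc = [4.3, 8.3, 16.2, 31.5, 61.4, 119.5]
# Hypothetical bin edges: geometric half-steps around each center.
ratio = (centers_mpc[-1] / centers_mpc[0]) ** (1.0 / (len(centers_mpc) - 1))
half_step = math.sqrt(ratio)

counts = []
for center in centers_mpc:
    r_lo = center / half_step / VOXEL_MPC   # convert Mpc to voxel units
    r_hi = center * half_step / VOXEL_MPC
    counts.append(pairs_between(r_lo, r_hi, big_r))

for center, count in zip(centers_mpc, counts):
    print(f"r_mid = {center:6.1f} Mpc: about {count:.2e} voxel pairs")
```

With these assumed edges the count grows by nearly the cube of the
separation ratio from one bin to the next, as expected when $r \ll L$.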



2.3 Maximum Likelihood for a Multinomial 
Distribution 

Since $n$ is large, to compare the mock observational data to a potential
model, we also bin the observational data into $N_B$ bins (in general
different from the number of bins $N_b$ in the model). Note that a binned
PDF is essentially a multinomial distribution of the variable $X$ given by
the set of bin probabilities $p = (p_1, \ldots, p_{N_B})$; given $n$ total
data points, there will be an expected number of $n_{{\rm exp},j} = n p_j$
data points in bin $j$. As for the covariance matrix $\Sigma$, the variance
of $X$ in a single bin $j$ is $p_j(1 - p_j)$, while the covariance between
different bins $i, j$ is $-p_i p_j$ (both per data point, i.e., up to an
overall factor of $1/n$ that does not change the location of the best fit).
How does one account for the covariance structure of a multinomial
distribution in a maximum likelihood estimate (MLE) fit?

In the limit of large $n$, the multinomial distribution is approximated by
the multivariate normal distribution with the same mean $p$ and covariance
$\Sigma$. We apply MLE to this multivariate normal with model parameters
$p^* = (p_1, \ldots, p_{N_B-1})$, where we drop the last bin $p_{N_B}$
because it is not an independent variable due to normalization constraints,
and none of the elements in $\Sigma$ are free variables as they are
completely determined by $p^*$. We thus find that
the objective function to be minimized is:

$$
(\Delta X^*)^T (\Sigma^*)^{-1} (\Delta X^*),
\qquad (11)
$$

where $X^*$ refers to the first $N_B - 1$ bins, so that the vector

$$
\Delta X^* = X^* - p^*
= (X_1 - p_1, \ldots, X_{N_B-1} - p_{N_B-1})
\qquad (12)
$$

is the deviation between the observed probabilities $X^*$ and model
probabilities $p^*$. Note that by definition $X_j = n_j/n$, where $n_j$ is
the actual number of data points observed in bin $j$. Similarly, $\Sigma^*$
is the covariance matrix of $X^*$, and is equal to the upper-left
$(N_B - 1) \times (N_B - 1)$ submatrix of $\Sigma$. Note that this approach
correctly accounts for the constraint of the total probability summing up
to unity.

Figure 1. 21-cm difference PDFs from Gluscevic & Barkana (2010) are shown
here as a function of their distance bin and redshift. The legend in the
first panel indicates the central values of the logarithmic distance bins.



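$\Sigma^*$ is indeed invertible, and takes the form:

$$
(\Sigma^*)^{-1} =
\begin{pmatrix}
\frac{1}{p_1} + \frac{1}{p_{N_B}} & \frac{1}{p_{N_B}} & \cdots & \frac{1}{p_{N_B}} \\
\frac{1}{p_{N_B}} & \frac{1}{p_2} + \frac{1}{p_{N_B}} & \cdots & \frac{1}{p_{N_B}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{p_{N_B}} & \frac{1}{p_{N_B}} & \cdots & \frac{1}{p_{N_B-1}} + \frac{1}{p_{N_B}}
\end{pmatrix}.
\qquad (13)
$$

Thus, we find that the objective function

$$
\begin{aligned}
(\Delta X^*)^T (\Sigma^*)^{-1} (\Delta X^*)
&= \left( \frac{X_1}{p_1} - \frac{X_{N_B}}{p_{N_B}}, \ldots,
\frac{X_{N_B-1}}{p_{N_B-1}} - \frac{X_{N_B}}{p_{N_B}} \right) (\Delta X^*) \\
&= \sum_{j=1}^{N_B} \frac{(X_j - p_j)^2}{p_j} \\
&= \frac{1}{n} \sum_{j=1}^{N_B}
\frac{(n_j - n_{{\rm exp},j})^2}{n_{{\rm exp},j}} \\
&= \frac{1}{n} \chi^2,
\end{aligned}
\qquad (14)
$$
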
where $\chi^2$ is the standard Pearson's chi-squared statistic. In summary,
to find the best MLE fit for a multinomial distribution of $N_B$ components
with $N_B - 1$ free parameters, one can simply minimize a standard $\chi^2$
statistic in which all $N_B$ terms in the $\chi^2$ are summed over.
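
The equivalence between the reduced quadratic form and Pearson's statistic
can be checked numerically. The Python sketch below draws a multinomial
sample for a hypothetical four-bin model, evaluates the quadratic form over
the first three bins using the closed-form inverse (diagonal entries
$1/p_i + 1/p_{N_B}$, off-diagonal entries $1/p_{N_B}$), and compares it to
$\chi^2/n$ summed over all four bins. The bin probabilities and sample size
are illustrative.

```python
import random


def inverse_entry(i, j, p):
    """Entry (i, j) of the inverse reduced covariance matrix."""
    return 1.0 / p[-1] + (1.0 / p[i] if i == j else 0.0)


def quadratic_form(x, p):
    """(dX*)^T (Sigma*)^{-1} (dX*) over the first N_B - 1 bins."""
    m = len(p) - 1
    d = [x[i] - p[i] for i in range(m)]
    return sum(d[i] * inverse_entry(i, j, p) * d[j]
               for i in range(m) for j in range(m))


def pearson_chi2(counts, p):
    """Standard Pearson chi-squared summed over all N_B bins."""
    n = sum(counts)
    return sum((c - n * pj) ** 2 / (n * pj) for c, pj in zip(counts, p))


rng = random.Random(1)
p = [0.5, 0.3, 0.15, 0.05]          # hypothetical model probabilities
n = 10000
draws = rng.choices(range(len(p)), weights=p, k=n)
counts = [draws.count(j) for j in range(len(p))]
x = [c / n for c in counts]         # observed probabilities X_j = n_j / n

lhs = quadratic_form(x, p)
rhs = pearson_chi2(counts, p) / n
print(lhs, rhs)                     # the two agree to rounding error
```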

We bin the values from the mock observation data into $N_B = 1,000$ bins;
this is justified as long as the bin width is much smaller than any scale
we hope to resolve in the observed PDF. We leave the last bin at the tail
of the observed difference PDF with a much wider width than the other bins,
so that each bin has more than 10 counts (with most having a vastly larger
count), and we do not have to use the C-statistic (Cash 1979) instead to
account for large relative errors at small counts.



3 RESULTS 
3.1 1-bin model 

The attempt to measure the difference PDF is essentially a contest between
a very high level of noise per measurement (almost three orders of
magnitude larger than the width of the intrinsic difference PDF in
Figure 1) and a very large number of measurements. A naive signal-to-noise
estimate may suggest that only $\sim 10^6$ measurements (i.e., the square
of the noise-to-signal ratio of each measurement) are needed for a rough
measurement of a given difference PDF value (in some bin of temperature
difference). In reality, though, the needed number is significantly higher,
because of the near-degeneracy that is encountered in what is essentially
an attempt to deconvolve the noisy difference PDF (see Ichikawa et al.
(2010) for a detailed discussion in the context of the one-point PDF).
Thus, it is prudent to start out conservatively, and try to fit a small
number of bins.

Luckily, theory suggests that even a 1-bin model is worth considering,
since it can yield valuable information. Such a model consists of a single
bin at $\Delta T_b \sim 0$, plus a normalization bin at higher values of
$\Delta T_b$. The value of the total probability $p_1$ of the difference
PDF in this first bin can be used as an approximation of $\Delta P_d$,
which refers to the theoretical limit of a component of $p_\Delta$ which is
a Dirac delta function at $\Delta T_b = 0$ (Gluscevic & Barkana 2010). This
$\Delta P_d$ effectively measures a low-resolution version of the
ionization correlation function during cosmic reionization
(Barkana & Loeb 2008); in the limit of perfect resolution, $\Delta P_d$
would exactly correspond to the joint ionization probability of two points
as a function of their distance $r$. In this limit, at $r \to 0$,
$\Delta P_d$ should simply equal the probability of having $T_b = 0$ mK,
which is the mean ionized fraction $x_i$, and $\Delta P_d$ should decrease
with increasing $r$ as the pair-correlation drops, until
$\Delta P_d \sim x_i^2$ at $r \to \infty$ (for which each voxel in the pair
is ionized independently). As long as the ionized regions maintain a low
(even if non-zero) neutral fraction, this description should be
approximately valid even for the realistic difference PDF.
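
The two limits bracket the range over which the delta-function component
can vary with separation. As a small worked example in the idealized
perfect-resolution limit, the sketch below evaluates them for the three
mean ionized fractions quoted in the figure captions of this paper
($x_i = 0.98$, 0.44 and 0.13 at $z = 6.9$, 8.2 and 10.1); the function name
is illustrative.

```python
def delta_pd_limits(x_i):
    """Joint ionization probability at r -> 0 and at r -> infinity."""
    fully_correlated = x_i       # both points share the same ionization state
    independent = x_i ** 2       # each point is ionized independently
    return fully_correlated, independent


for z, x_i in [(6.9, 0.98), (8.2, 0.44), (10.1, 0.13)]:
    small_r, large_r = delta_pd_limits(x_i)
    print(f"z = {z}: from {small_r:.3f} (r -> 0) down to {large_r:.3f} (r -> infinity)")
```

The dynamic range between the two limits is largest, in relative terms,
early in reionization and nearly vanishes once $x_i$ approaches unity.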

We begin with a 1-bin model consisting of a first bin between
$\Delta T = 0$ and 4 mK, plus a normalization bin at $\Delta T = 4$ to
40 mK. Figure 2 illustrates the main results for this one-parameter model
with one-year MWA noise. Even with this relatively large noise, the high
number of voxel pairs in the observational volume of the MWA drives the
finite sampling noise to quite low levels, in many cases allowing us to
overcome the degeneracy in the reconstruction. The value of the difference
PDF in the first bin can be measured with an accuracy of a few percent for
the larger voxel-pair distances $r_{\rm mid} = 61.4$ and 119.5 Mpc, with
some useful measurements possible at lower radii. The measurements are
particularly advantageous at low redshift, where the high ionization
fraction produces a large $\Delta P_d$ feature in the intrinsic difference
PDF (Figure 1); this makes the measurement easier, and also makes the
measured value more closely related to the ionization fraction. Indeed, the
strong rise with time in the measured 1-bin $p_1$, occurring simultaneously
at all $r$ bins, constitutes a clear detection of the end of reionization
(at $z = 6.9$ in this case).

Figure 2. Measured value and error of the total probability $p_1$ in the
first bin ($\Delta T = 0$ to 4 mK) of a 1-bin model of the difference PDF.
We show this as a function of redshift, assuming MWA noise, for various
voxel-pair distance bins: $r_{\rm mid} = 4.3$, 8.3, 16.2, 31.5, 61.4, and
119.5 comoving Mpc (panels from top to bottom). We compare in each case the
true value (green square), the mean fit value (red dot), and 16-84
percentile values (red error bars) based on 1,000 instances of mock
observation data; we refer to these percentile values as the $\pm 1\sigma$
range in the rest of this paper. We also show the $\pm 1\sigma$ errors
magnified by a factor of 10 (blue error bars), so that smaller errors are
easier to see.

As shown in Figure 3, decreasing the noise by a factor of 2 (MWA/2) often
decreases the errors in the reconstructed difference PDF by a factor of
3-4, yielding some information even at the lowest values of $r$. Since
values of $r \sim 10$ Mpc are required to see the variation of $p_\Delta$
with distance (Figure 1), this lower noise would allow us to get an
indication of the average size of ionized bubbles, above which the
correlation strength (and $p_1$) drops.

Figure 3. Same as Figure 2 but for MWA/2 noise, corresponding to 4 years of
observations with the MWA.

[Horizontal axis of Figure 4: Noise Improvement Factor vs MWA]

Figure 4. Relative error of the measured $p_1$ in the first bin
($\Delta T = 0$ to 4 mK) of the 1-bin model as a function of noise at
$z = 6.9$, when the mean ionization fraction was $x_i = 0.98$. Here the
noise improvement factor is the factor by which the thermal noise is
reduced compared to 1-year MWA observations (equation 2). The relative
error is defined as the range between the $\pm 1\sigma$ values divided by
the true value.

Figures 4, 5 and 6 give a more complete indication of how the fit error
varies with the thermal noise, at the end, middle, and beginning of the
epoch of reionization, respectively. Again, we find that the larger
distance bins have smaller errors because they have far more voxel-pairs,
reducing the sampling error. Above some high level of noise, the degeneracy
is complete and the fit error is of order unity regardless of the noise per
pixel $\sigma_N$ (note that the probability within a bin is limited to vary
between 0 and 1). However, below some critical value (which varies with $r$
due to the different numbers of voxel-pairs), the fit error begins to
decrease as $\sigma_N$ decreases. This decrease is faster than linear
(typically close to quadratic) since reduced noise removes some of the
partial degeneracy involved in the effective deconvolution of $p_\Delta$.
Note that the low noise levels we consider can correspond either to
multi-year observations with the MWA or to future radio arrays with a
larger collecting area.

At $z = 6.9$, the MWA suffices to measure the 1-bin difference PDF with
$< 10\%$ errors down to the $r = 16$ Mpc bin, and thus verify the signature
of the end of reionization. At the larger separations, the MWA can
determine $p_\Delta$ to better than a percent. Higher redshifts are more
challenging, so that at $z = 8.2$ (near the midpoint of reionization),
$< 10\%$ errors are possible with the MWA only at the two highest
separations, MWA/2 gets down to $\sim 30$ Mpc, and MWA/10 allows
measurements at the full range of separations. Early in reionization
($z = 10.1$), when the measurement noise is larger due to the higher
redshift, and the difference PDF is still close to the Gaussian shape
driven by density fluctuations, the MWA can only attempt to measure the
long-separation limit of $p_\Delta$, but a second-generation experiment
should still be able to probe a broad range of distances.

Alternatively, we can use a 1-bin model where the first bin is smaller,
between $\Delta T = 0$ and 1 mK. This pinpoints the fraction of voxel-pairs
with $\Delta T_b \sim 0$ (which approximates $\Delta P_d$) more accurately.
Thus, $p_1$ is significantly lower than with the wider first bin considered
before, except at the very end of reionization. The measurement errors are
generally similar to before, as illustrated for MWA/2 noise in Figure 7,
which can be compared to Figure 3. Figure 8 uses a different presentation
to show more directly how with second-generation radio arrays we can
accurately map $\Delta P_d$ as a function of $r$ across different
redshifts, with the location of the flat asymptote of $p_1$ telling us
where $r$ drops below the correlation length, which can be used to
determine the average size of ionized bubbles. We note that one
conservative way to begin investigating the shape of the difference PDF is
to fit 1-bin models with various bin widths and compare the results.
However, a more direct method is to use models with more bins, which we
consider next.

Figure 5. Same as Figure 4 but at $z = 8.2$, when $x_i = 0.44$.

Figure 6. Same as Figure 4 but at $z = 10.1$, when $x_i = 0.13$.

Figure 7. Measured value and error of the total probability $p_1$ in the
first bin of a 1-bin model of the difference PDF. Same setup and notation
as in Figure 2, but with MWA/2 noise as in Figure 3, and here using a model
with a narrower first bin ($\Delta T = 0$ to 1 mK).



3.2 10-bin model 

Given that the 1-bin model is expected to yield accurate measurements of
the difference PDF, even better than 1% measurements in some cases, we now
consider a more ambitious attempt to measure the detailed shape of the
difference PDF. We consider a 10-bin model consisting of 10 equal-size bins
between $\Delta T = 0$ and 10 mK, plus a normalization bin at
$\Delta T = 10$ to 40 mK. Of course, many other binning choices are
possible, including redshift-dependent binning, but our choice should
suffice to determine whether the shape of the PDF can be determined with
1 mK bins over a range where there is interesting dependence on $\Delta T$
throughout the reionization era.

Figure 8. Measured value and error of the total probability $p_1$ in the
first bin ($\Delta T = 0$ to 1 mK) of a narrow 1-bin model of $p_\Delta$,
shown as a function of voxel-pair distance $r$ in comoving Mpc (bin
center), with MWA/10 noise.

Near the end of reionization, we illustrate the expected reconstruction
accuracy in three separation bins, $r_{\rm mid} = 16.2$ (Figure 9), 61.4
(Figure 10) and 119.5 Mpc (Figure 11). While the theoretical difference PDF
should have a simple shape, with nearly all the probability concentrated in
the first bin, it would be exciting to directly verify this
observationally. At 16.2 Mpc, the error in the first bin is very large, and
there are strong degeneracies among the various bins, as illustrated by the
failure of the fitting errors to decrease in going from MWA to MWA/2
errors. The degeneracy is broken, however, with MWA/10 errors, in which
case the expected shape of $p_\Delta$ can be precisely verified. We note
that the $r_{\rm mid} = 8.3$ Mpc bin (not shown) shows similar
reconstruction errors to the 16.2 Mpc case (except that the errors are
larger by a factor of a few for MWA/10). At the two largest-separation
bins, $p_\Delta$ can be reasonably measured with MWA/2 (for
$r_{\rm mid} = 61.4$ Mpc) or even with MWA errors (for the highest $r$
bin).

Near the midpoint of reionization, we illustrate the results for the same
three separation bins, in Figures 12, 13 and 14. The results here are
similar, in that useful measurements require MWA/10 errors for
$r_{\rm mid} = 16.2$ Mpc, MWA/2 for $r_{\rm mid} = 61.4$ Mpc, and just MWA
at the highest separation. Here again the $r_{\rm mid} = 8.3$ Mpc bin (not
shown) is similar to the 16.2 Mpc case, except for significantly less
accurate (though still useful) measurements for MWA/10.

Finally, at a higher redshift when reionization is still in its early
stages, measurements are more difficult so we show only the two highest
distance bins, 61.4 (Figure 15) and 119.5 Mpc (Figure 16). At this
redshift, only MWA/10 errors allow useful constraints on $p_\Delta$, in
particular giving a reasonable measurement at 119.5 Mpc.

We conclude that with the 10-bin model, one-year MWA observations can give
a rough measurement of the difference PDF only at the highest separation,
and during mid-to-late reionization. Four-year observations would
substantially decrease the errors on the highest separation bin, and give
some constraints on the lower, $r_{\rm mid} = 61.4$ Mpc bin. However, only
with MWA/10 thermal noise, i.e., with next-generation radio array
experiments, will it be possible to recover the shape of the difference PDF
across most distances, and thus directly constrain the ionization
correlation length (or bubble size) with few assumptions needed.

Figure 9. Measured value and $1\sigma$ error of $p_\Delta$ in the 10-bin
model, at $z = 6.9$, in the $r_{\rm mid} = 16.2$ Mpc bin, shown for three
different levels of the thermal noise (as indicated in each panel). We
compare the true, input difference PDF (black line) in 10 bins of width
1 mK (an extra normalization bin at $\Delta T > 10$ mK is not shown) to the
mean value and 16-84 percentile range of the reconstructed $p_\Delta$ based
on fitting to 1,000 mock data sets.

Figure 10. Same as Figure 9 but for the $r_{\rm mid} = 61.4$ Mpc bin.

With multiple parameters in the 10-bin model, the results are driven by
degeneracies between the probability fit values of various bins. We plot an
example of this in Figure 17 for the first two bins out of the ten. Since
it is hard to distinguish neighboring bins, a significant fraction of the
values lie along a diagonal line that illustrates a strong positive
correlation. A few poor fits lie on the axes, since the fit parameters for
each $\Delta T_b$ bin denote probabilities and were constrained to
non-negative values in the chi-squared minimization. Thirty-three fit
values (out of 1,000) lie outside the boundaries shown in the figure.



4 CONCLUSIONS 

We have studied the expected errors in reconstructing the difference PDF of
the 21-cm brightness temperature during cosmic reionization. We have shown
how to perform a maximum likelihood fit to a model of a binned difference
PDF, and applied it to mock observational data for a realistic field of
view and a range of thermal noise levels.

[Axis label of Figure 17: Bin 1: $p_\Delta$ [mK$^{-1}$]]

Figure 17. Fits of the first 2 bins in the 10-bin model, at $z = 8.2$, with
$r_{\rm mid} = 61.4$ comoving Mpc, for MWA/2 noise. Bin 1 is $\Delta T = 0$
to 1 mK, while Bin 2 is $\Delta T = 1$ to 2 mK. We color code the fit
values for each of the 1,000 instances of mock observational data by the
root-mean-square of the relative errors in Bin 1 and Bin 2. The mean fit
and $1\sigma$ errors of all 10 bins can be seen in Figure 13b.

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Figure 11. Same as Figure 9 but for the r_mid = 119.5 Mpc bin. 
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Figure 12. Same as Figure 9 (i.e., r_mid = 16.2 Mpc) but at z = 8.2. 
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Figure 14. Same as Figure 12 but for the r_mid = 119.5 Mpc bin. 
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Figure 15. Same as Figure 10 (i.e., r_mid = 61.4 Mpc) but at z = 10.1. 
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Figure 16. Same as Figure 15 but for the r_mid = 119.5 Mpc bin. 

Previous work shows that the difference PDF during 
reionization should display a strongly evolving shape that, 
if measured, can be used to probe both the mean ionization 
history and the typical size of the ionized bubbles. Early in 
reionization, the difference PDF still resembles the Gaussian 
shape driven by density fluctuations, but it later flattens 
and develops a two-peak structure, including a peak at 
ΔT_b = 0 due to jointly ionized pixel pairs, and a peak at 
ΔT_b ~ 10-15 mK due to the temperature difference between 
an ionized pixel and one that is still mostly neutral. 
The difference PDF reaches an asymptotic form at large 
pair separations (where the two pixels in the pair are 
essentially independent); the distance below which it 
substantially changes shape is a measure of the correlation 
length (of density, early on, and mainly of ionization during 
the later stages of reionization). 
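
The origin of the two peaks can be seen in a deliberately crude toy model, sketched below. It is not the simulation used in this work. It assumes each pixel is either fully ionized (zero signal) with probability x_ion or mostly neutral with a Gaussian brightness temperature, and that the two pixels of a pair are independent, as at large separations. The parameter values (x_ion = 0.6, a neutral mean of 12 mK with 2 mK scatter) and the function names are hypothetical.

```python
import random


def toy_difference_sample(x_ion, t_mean, t_sigma, n_pairs, rng):
    """Draw |ΔT_b| (in mK) for n_pairs independent pixel pairs."""
    def pixel():
        if rng.random() < x_ion:
            return 0.0                       # ionized pixel: no 21-cm signal
        return rng.gauss(t_mean, t_sigma)    # mostly neutral pixel

    return [abs(pixel() - pixel()) for _ in range(n_pairs)]


def histogram(values, bin_width, n_bins):
    """Fraction of values in each bin of width bin_width; overflow is dropped."""
    counts = [0] * n_bins
    for v in values:
        k = int(v // bin_width)
        if k < n_bins:
            counts[k] += 1
    return [c / len(values) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = toy_difference_sample(0.6, 12.0, 2.0, 100000, rng)
    for k, frac in enumerate(histogram(sample, 1.0, 20)):
        print(f"{k:2d}-{k + 1:2d} mK  {'#' * int(200 * frac)}")
```

In this toy, jointly ionized pairs pile up at ΔT_b = 0 with weight x_ion^2, while ionized-neutral pairs, with weight 2 x_ion (1 - x_ion), form the second peak near the neutral-pixel temperature. Correlations between the pixels, which carry the bubble-size information at smaller separations, are ignored here by construction.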

We have found that a conservative approach that attempts 
only to reconstruct the first bin of the difference PDF 
(which is related to the ionization correlation function) 
should yield highly accurate measurements. Even one-year 
MWA data should suffice for seeing the signature of the end 
of reionization in the difference PDF, and for probing large 
separations at earlier times. Four-year data should improve 
the results markedly, typically decreasing errors by about a 
factor of 4 (rather than the factor of 2 expected from the 
usual scaling of thermal noise with the square root of the 
integration time), since decreased noise helps remove some 
of the partial degeneracies inherent in what is essentially 
an attempt to deconvolve the noisy difference PDF. A 
second-generation experiment should be able to probe a wide 
range of separations during most of the reionization era, 
assuming that reionization ends at z ~ 7 and not much 
earlier. We note that measuring the 1-bin model (i.e., a 
single parameter) is qualitatively similar in difficulty to 
measuring the correlation function (or equivalently the 
power spectrum), which is essentially equivalent to measuring 
a single number (the variance) from the difference PDF 
in each bin of separation distance. 
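
The link between the correlation function and the variance of the difference PDF follows from the identity <(T_1 - T_2)^2> = 2 [σ^2 - ξ(r)], which holds for any statistically homogeneous field with one-point variance σ^2 and correlation function ξ(r). The sketch below checks it numerically on correlated Gaussian pairs; the Gaussian field is used only for convenience (the identity itself does not require it), and the function names and parameter values are hypothetical.

```python
import math
import random


def correlated_pairs(sigma, rho, n_pairs, rng):
    """Gaussian pixel pairs with variance sigma**2 and correlation xi = rho * sigma**2."""
    pairs = []
    for _ in range(n_pairs):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        t1 = sigma * z1
        t2 = sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        pairs.append((t1, t2))
    return pairs


def xi_from_difference(pairs, sigma):
    """Invert <(T1 - T2)^2> = 2 (sigma^2 - xi) for the correlation function xi."""
    mean_sq = sum((t1 - t2) ** 2 for t1, t2 in pairs) / len(pairs)
    return sigma ** 2 - 0.5 * mean_sq


if __name__ == "__main__":
    rng = random.Random(0)
    sigma, rho = 5.0, 0.4                     # hypothetical values, sigma in mK
    pairs = correlated_pairs(sigma, rho, 100000, rng)
    print("input xi    :", rho * sigma ** 2)
    print("recovered xi:", xi_from_difference(pairs, sigma))
```

The full difference PDF carries more than this second moment, which is why the multi-bin fits are a harder measurement than the correlation function.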

Given these results with a 1-bin model, we have also 
considered a more ambitious attempt to measure the de- 
tailed shape of the difference PDF over ten bins. We found 
that with the 10-bin model, one-year MWA observations can 
give a rough measurement of the difference PDF only at 
the highest separation, and during mid-to-late reionization. 
Four-year observations can give some improved constraints, 
but only a second-generation radio array will make it pos- 
sible to recover the detailed shape of the difference PDF 
across a range of distances and reionization stages. 

We note that while we have only gone up to the bin of 
separation distance centered at r ~ 120 Mpc, it would be 
useful to measure bins at even larger separations. 
Theoretically, p_Δ should be nearly constant with r at such 
large distances, since the two voxels of a pair are essentially 
independent of each other, but observationally the number 
of pairs keeps rising with distance. The total number of pairs 
in the MWA field of view, given the pixel size of 2.9 Mpc, 
is ~ 3 × 10^15 at z = 8. Thus, the number of pairs available 
for measuring the large-separation difference PDF is 
potentially 100 times larger than the value we assumed, which was 
already high compared to the available numbers at smaller 
separations. While a measurement of the large-separation 
difference PDF would not probe correlation functions, it 
would probe the cosmic mean ionized fraction and, essen- 
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tially, the one-point PDF (which independently describes 
each point of the pair) at an exquisite precision. 
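
As a rough consistency check on the quoted pair count, the number of distinct pairs among N voxels is N (N - 1) / 2, so ~ 3 × 10^15 pairs corresponds to somewhat under 10^8 voxels. The sketch below carries out this arithmetic. The function names are hypothetical, and the final error-scaling line rests on the idealized assumption of statistically independent pairs, which does not strictly hold since each voxel enters many pairs.

```python
import math


def pairs_from_voxels(n_voxels):
    """Number of distinct unordered pairs among n_voxels voxels."""
    return n_voxels * (n_voxels - 1) / 2.0


def voxels_from_pairs(n_pairs):
    """Invert n (n - 1) / 2 = n_pairs for the voxel count n."""
    return 0.5 * (1.0 + math.sqrt(1.0 + 8.0 * n_pairs))


if __name__ == "__main__":
    total_pairs = 3e15                       # value quoted in the text for z = 8
    n_voxels = voxels_from_pairs(total_pairs)
    print(f"implied voxel count: {n_voxels:.2e}")

    # The text states the total is potentially 100 times the number assumed.
    assumed_pairs = total_pairs / 100.0
    # Idealized assumption: errors fall as 1/sqrt(N_pairs) for independent pairs.
    gain = math.sqrt(total_pairs / assumed_pairs)
    print(f"idealized error reduction factor: {gain:.0f}")
```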

Now that we have shown that measurements of the dif- 
ference PDF are quite promising relative to the expected 
thermal noise, the next challenge is to consider similar statis- 
tics in the presence of realistic foreground residuals and 
other systematic errors. In particular, systematic errors that 
vary across the field of view might make it difficult in 
practice to include the wide-separation pairs just mentioned. 
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